Stable Offline Hand-Eye Calibration for any Robot with Just One Mark
Xie, Sicheng, Meng, Lingchen, Du, Zhiying, Tu, Shuyuan, Cao, Haidong, Leng, Jiaqi, Wu, Zuxuan, Jiang, Yu-Gang
Imitation learning has achieved remarkable success in a variety of robotic tasks by learning a mapping from camera-space observations to robot-space actions. Recent work indicates that using robot-to-camera transformation information ({\ie}, camera extrinsics) benefits the learning process and produces better results. However, camera extrinsics are often unavailable, and estimation methods typically suffer from local minima and poor generalization. In this paper, we present CalibAll, a simple yet effective method that \textbf{requires only a single mark} and performs training-free, stable, and accurate camera extrinsic estimation across diverse robots and datasets through a coarse-to-fine calibration pipeline. In particular, we annotate a single mark on an end-effector (EEF) and leverage the correspondence ability that emerges from vision foundation models (VFMs) to automatically localize the corresponding mark across robots in diverse datasets. Using this mark, together with point tracking and the 3D EEF trajectory, we obtain a coarse camera extrinsic via temporal Perspective-n-Point (PnP). This estimate is further refined through a rendering-based optimization that aligns rendered and ground-truth masks, yielding accurate and stable camera extrinsics. Experimental results demonstrate that our method outperforms state-of-the-art approaches, showing strong robustness and general effectiveness across three robot platforms. It also produces useful auxiliary annotations, such as depth maps, link-wise masks, and end-effector 2D trajectories, which can further support downstream tasks.
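The coarse step described above — PnP from the tracked 2D mark positions and the known 3D EEF trajectory — can be sketched with a plain Direct Linear Transform solver. This is a minimal, numpy-only illustration assuming known intrinsics K and noise-free correspondences; function and variable names are ours, and the paper's actual temporal-PnP pipeline is more elaborate.

```python
import numpy as np

def coarse_extrinsic_dlt(pts3d, pts2d, K):
    """DLT-style PnP: recover the rotation R and translation t that map
    robot-base coordinates into the camera frame, from >= 6 2D-3D pairs."""
    rows = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        Xh = [X, Y, Z, 1.0]
        # Two linear constraints per correspondence from x × (P X) = 0.
        rows.append([0.0] * 4 + [-x for x in Xh] + [v * x for x in Xh])
        rows.append(Xh + [0.0] * 4 + [-u * x for x in Xh])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    P = Vt[-1].reshape(3, 4)       # projection matrix, up to scale and sign
    Rt = np.linalg.inv(K) @ P      # = c * [R | t] for some scalar c
    U, S, Vh = np.linalg.svd(Rt[:, :3])
    sign = np.sign(np.linalg.det(U @ Vh))
    R = sign * (U @ Vh)            # nearest rotation, sign-corrected
    t = sign * Rt[:, 3] / S.mean() # undo the unknown scale |c|
    return R, t
```

In the abstract's setting, `pts2d` would come from point-tracking the annotated mark across frames and `pts3d` from forward kinematics of the EEF at the same timestamps; the result then seeds the rendering-based refinement.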
VROOM - Visual Reconstruction over Onboard Multiview
Yadav, Yajat, Bharadwaj, Varun, Korrapati, Jathin, Baranwal, Tanish
We introduce VROOM, a system for reconstructing 3D models of Formula 1 circuits using only onboard camera footage from racecars. Leveraging video data from the 2023 Monaco Grand Prix, we address challenges such as high-speed motion and sharp cuts between camera frames. Our pipeline evaluates methods such as DROID-SLAM, AnyCam, and Monst3r, and combines preprocessing techniques such as masking, temporal chunking, and resolution scaling to account for dynamic motion and computational constraints. We show that VROOM is able to partially recover track and vehicle trajectories in complex environments. These findings indicate the feasibility of using onboard video for scalable 4D reconstruction in real-world settings.
Phantom: Training Robots Without Robots Using Only Human Videos
Lepert, Marion, Fang, Jiaying, Bohg, Jeannette
Our method enables training robot policies without collecting any robot data. We first collect human video demonstrations in diverse environments and use inpainting to remove the human hand. A rendered robot is then inserted into the scene using the estimated hand pose. The resulting augmented dataset is used to train an imitation learning policy, which is deployed zero-shot on a real robot.

Abstract: Scaling robotics data collection is critical to advancing general-purpose robots. Current approaches often rely on teleoperated demonstrations, which are difficult to scale. We propose a novel data collection method that eliminates the need for robotics hardware by leveraging human video demonstrations. By training imitation learning policies on this human data, our approach enables zero-shot deployment on robots without collecting any robot-specific data. To bridge the embodiment gap between human and robot appearances, we apply a data editing approach to the input observations that aligns the image distributions between training data on humans and test data on robots. Our method significantly reduces the cost of diverse data collection by allowing anyone with an RGBD camera to contribute. We demonstrate that our approach works in diverse, unseen environments and on varied tasks.

Introduction: Data scarcity remains a key challenge in advancing robotics research. While large-scale data collection efforts are gaining momentum, even the largest robotics datasets [1, 7] are significantly smaller than those used to train generalist models in natural language processing and computer vision. These efforts are constrained by the slow and costly process of collecting data with robotics hardware.
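The final step of the data-editing pipeline above — pasting the rendered robot over the inpainted, human-free frame — amounts to a per-pixel composite. This is a minimal sketch assuming the inpainted background, the rendered robot layer, and its binary mask are already available; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def overlay_robot(background, robot_rgb, robot_mask):
    """Per-pixel composite: where the render mask is set, show the rendered
    robot; elsewhere keep the inpainted, human-free background frame."""
    mask = np.asarray(robot_mask).astype(bool)[..., None]  # (H, W, 1)
    return np.where(mask, robot_rgb, background)
```

The same composite is applied frame-by-frame, so every training observation shows a robot in place of the human hand while the rest of the scene is untouched.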
GSAVS: Gaussian Splatting-based Autonomous Vehicle Simulator
Modern autonomous vehicle simulators feature an ever-growing library of assets, including vehicles, buildings, roads, pedestrians, and more. While this level of customization proves beneficial when creating virtual urban environments, the process becomes cumbersome when the goal is to train within a digital twin, or duplicate, of a real scene. Gaussian splatting has emerged as a powerful technique for scene reconstruction and novel view synthesis, offering high fidelity and fast rendering. In this paper, we introduce GSAVS, an autonomous vehicle simulator that supports the creation and development of autonomous vehicle models. Every asset within the simulator is a 3D Gaussian splat, including the vehicles and the environment. However, the simulator runs within a classical 3D engine, rendering 3D Gaussian splats in real time. This allows the simulator to combine the photorealism of 3D Gaussian splatting with the customization and ease of use of a classical 3D engine.
Autonomous Field-of-View Adjustment Using Adaptive Kinematic Constrained Control with Robot-Held Microscopic Camera Feedback
Lin, Hung-Ching, Marinho, Murilo Marques, Harada, Kanako
The limited field-of-view (FoV) of a microscopic camera necessitates camera motion to capture a broader workspace. In this work, we propose an autonomous robotic control method to constrain a robot-held camera within a designated FoV. Furthermore, we model the camera extrinsics as part of the kinematic model and use camera measurements, coupled with U-Net-based tool tracking, to adapt the complete robotic model during task execution. As a proof-of-concept demonstration, the proposed framework was evaluated in a bi-manual setup, where the microscopic camera was controlled to view a tool moving along a pre-defined trajectory. The proposed method allowed the camera to stay within the real FoV 99.5% of the time, compared to 48.1% without the proposed adaptive control.
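The "time within the FoV" metric reported above can be made concrete by projecting the tracked tool point and measuring its pixel margin to the image border. The sketch below assumes a pinhole intrinsic matrix K and a point expressed in the camera frame; it illustrates the containment check only, not the paper's adaptive constrained controller.

```python
import numpy as np

def fov_margin(p_cam, K, width, height):
    """Project a 3D point (camera frame, z > 0) with intrinsics K and return
    its pixel distance to the nearest image border; positive means inside
    the FoV, so the fraction of frames with a positive margin gives the
    'time within FoV' statistic."""
    u, v, w = K @ np.asarray(p_cam, dtype=float)
    u, v = u / w, v / w
    return min(u, width - u, v, height - v)
```

A kinematically constrained controller would keep this margin above a safety threshold while the camera-holding arm moves.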